Feb 3, 2004 2:12pm
I posted this in the 'text archive' forum, but the topic is really about the 'million books' project.
I took one of the books (Early Jazz) and tried converting it to PDF using Acrobat 6 with a variety of settings. JBIG2 "lossy" compression turned the 88 MB .tif file into a 7.34 MB .pdf, which you can browse at:
If you OCR the pages so the same file has searchable text as well as the page images, it comes to 16.7 MB:
This compares to 33.9 MB for DjVu.
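For a rough sense of the ratios, here is a quick Python sketch that uses only the sizes quoted in this post. The labels and function names are made up for illustration, and it assumes all four figures describe the same book and are measured in the same units.

```python
# Sizes in MB, copied from the figures quoted in this post.
SIZES_MB = {
    "TIFF (original)": 88.0,
    "PDF, JBIG2 lossy": 7.34,
    "PDF, JBIG2 lossy + OCR text": 16.7,
    "DjVu": 33.9,
}


def ratio_to_original(size_mb, original_mb=SIZES_MB["TIFF (original)"]):
    """Return how many times smaller size_mb is than the original TIFF."""
    return original_mb / size_mb


def summarize(sizes=SIZES_MB):
    """Map each format label to its compression ratio, rounded to one decimal."""
    return {name: round(ratio_to_original(mb), 1) for name, mb in sizes.items()}


if __name__ == "__main__":
    for name, ratio in summarize().items():
        print(f"{name}: {ratio}x smaller than the TIFF")
```

So the image-only PDF is about 12x smaller than the TIFF, the PDF with OCR text about 5.3x, and the DjVu file about 2.6x.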
I think you might need Adobe Reader 6 to open these files:
Using "reduce file size" with JBIG2 also cleans up the text images a bit, since it reduces noise.
Anyway, I'm interested in helping (with software, etc.) if someone wants to pursue this for 'million books'; let me know.